Coronaviruses cause a variety of respiratory and enteric diseases in animals and humans including severe 
acute respiratory syndrome. In these enveloped viruses, the filamentous nucleocapsid is formed by the 
association of nucleocapsid (N) protein with single-stranded viral RNA. The N protein is a highly immunogenic 
phosphoprotein also implicated in viral genome replication and in modulating cell signaling pathways. We 
describe the structure of the two proteolytically resistant domains of the N protein from infectious bronchitis 
virus (IBV), a prototype coronavirus. These domains are located at its N- and C-terminal ends (NTD and CTD, 
respectively). The NTD of the IBV Gray strain at 1.3-Å resolution exhibits a U-shaped structure, with two arms 
rich in basic residues, providing a module for specific interaction with RNA. The CTD forms a tightly 
intertwined dimer with an intermolecular four-stranded central β-sheet platform flanked by α helices, indicating 
that the basic building block for coronavirus nucleocapsid formation is a dimeric assembly of N protein. 
The variety of quaternary arrangements of the NTD and CTD revealed by the analysis of the different crystal 
forms delineates possible interfaces that could be used for the formation of a flexible filamentous ribonucleo- 
capsid. The striking similarity between the dimeric structure of CTD and the nucleocapsid-forming domain of 
a distantly related arterivirus indicates a conserved mechanism of nucleocapsid formation for these two viral 
families. 


Coronaviridae is a family of viruses which are the causative 
agents of human upper respiratory infections including com- 
mon colds, as well as severe illnesses such as severe acute 
respiratory syndrome (SARS). Avian infectious bronchitis vi- 
rus (IBV) is a major source of mortality in chickens worldwide 
and has a significant impact on the poultry industry. Other 
coronaviruses affect domestic animals and are of veterinary 
significance. Coronaviruses are enveloped viruses with a diam- 
eter ranging from 80 to 160 nm. Their viral genome consists of 
positive-sense single-stranded (ss) RNA of approximately 30 
kb (36). The genomic RNA encodes a 3’ coterminal set of four 
or more subgenomic mRNAs with a common leader sequence 
at their 5’ ends. These subgenomic RNA segments encode 
various structural and nonstructural viral proteins that are re- 
quired to produce progeny virions. The viral particle consists of 
a nucleocapsid or core structure surrounded by a lipid enve- 
lope in which the membrane glycoprotein (M) and another 
small transmembrane protein (E) are embedded. A series of 
protrusions composed of glycoproteins (S) anchored in the 
lipid envelope extend radially, forming up to 20-nm-long spikes 
which give the roughly spherical viral particles a crown (co- 
rona) appearance. 

During the virus life cycle, multiple copies of the nucleocap- 
sid phosphoprotein (N) interact intimately with genomic and 
subgenomic RNA molecules (1, 28) and together with M, the 
most abundant envelope protein, participate in genome con- 
densation and packaging. The N and M proteins interact via 
their C termini, leading to specific genome encapsidation in 
the budding viral particle (19). Electron microscopic studies of 
detergent-permeabilized transmissible gastroenteritis virus 
capsids revealed that the internal ribonucleocapsid is a flexible 
filamentous structure with a diameter of approximately 10 to 
15 nm and up to several hundred nanometers in length (33, 
34). The highly basic N protein has a molecular mass ranging 
between 45 and 60 kDa in the various groups of coronaviruses 
and, along with its coding RNA, is synthesized in large 
amounts during infection (20, 37). The N protein is able to 
bind ssRNA nonspecifically but displays an increased affinity 
for viral genomic RNA (9). Packaging signals have been iden- 
tified at the 5’ and 3’ termini of the genome for several coro- 
naviruses, but not unambiguously for the IBV genome. Bio- 
chemical studies of murine hepatitis virus (MHV), IBV, and 
SARS coronavirus (SARS-CoV) have mapped the RNA bind- 
ing function to a segment of 55 residues located at the N- 
terminal half of the N protein and the dimerization function to 
its C-terminal half (14, 29, 45). In addition to its structural role, 
the N protein is also implicated in other processes during 
infection including mRNA transcription, replication, and host 
cell modulation (16, 20, 25, 35, 39, 41, 44). The N protein is 
also an important diagnostic marker for coronavirus disease 
and a major immunogen that can prime protective immune 
responses (23). 

The recombinant N protein of coronavirus expressed in 
Escherichia coli is highly susceptible to proteolysis, making 
structural analysis of the full-length protein difficult. To date, 
there is only limited information on the structure of the ribo- 
nucleocapsid protein, including a nuclear magnetic resonance 
(NMR) analysis of the N-terminal domain of the SARS-CoV 
N protein (18) and our crystallographic study of the N-terminal 
domain of the IBV N protein (Beaudette strain) at 1.85-Å 
resolution (14). As yet, there is no X-ray crystallographic struc- 
ture analysis of the C-terminal dimerization domain of a coro- 
navirus N protein. In this paper, we present a structural anal- 
ysis, at high resolution, of the two proteolytically stable 
domains of the IBV N proteins located at the N- and C- 
terminal ends, called NTD and CTD, respectively. The NTD 
structure of IBV (Gray strain), determined to 1.3-Å resolution, 
is similar to the previously published Beaudette strain structure 
(12) but makes strikingly different quaternary associations. We 
describe the first crystal structure of the CTD dimerization 
domain of the coronavirus N protein at a resolution of 2.2 Å. 
Our X-ray crystallographic analysis of the NTD and CTD 
provides insight into the way these modules might interact with 
RNA and with the M protein. The various crystal forms also 
delineate a number of alternative protein surfaces that are 
likely to be used for the formation of a flexible filamentous 
ribonucleocapsid. 


MATERIALS AND METHODS 


Purification of full-length nucleocapsid protein and limited proteolysis. The 
full-length N protein was expressed as described previously (47). The protein was 
further purified by heparin affinity chromatography, concentrated to 1 to 2 mg/ml 
and checked for homogeneity using dynamic light scattering (Dynapro) and 
negative-stain electron microscopy. Limited proteolytic cleavage of the full- 
length N protein (1 to 2 mg/ml) was carried out with 2% (weight trypsin/weight 
protein) sequencing-grade trypsin (Roche) to identify stable domains. The iden- 
tity of the amino termini of the proteolytic product(s) was determined by N- 
terminal amino acid sequencing of the band following gel electrophoresis and 
blotting onto a polyvinylidene fluoride membrane (PVDF-Immobilon-PS2; Mil- 
lipore). For construct optimization the identities of the carboxy-terminal amino 
acids were estimated based on secondary structure prediction and mass spectro- 
metric characterization of the proteolyzed protein. 
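
As a worked illustration of the 2% (weight trypsin/weight protein) ratio quoted above, the following minimal Python sketch computes the trypsin mass for a digest. The function name and the 1-ml digest volume are assumptions made for illustration only and are not part of the protocol.

```python
def trypsin_mass_ug(protein_mg_per_ml, volume_ml, ratio_percent=2.0):
    """Mass of trypsin (micrograms) for a weight/weight ratio to substrate protein."""
    protein_ug = protein_mg_per_ml * volume_ml * 1000.0  # convert mg to micrograms
    return protein_ug * ratio_percent / 100.0

# Protein concentration range quoted in the text (1 to 2 mg/ml);
# the 1-ml volume is a hypothetical example value.
for conc in (1.0, 2.0):
    print(conc, "mg/ml ->", trypsin_mass_ug(conc, 1.0), "ug trypsin")
```

Under these assumptions, the quoted concentration range corresponds to a few tens of micrograms of trypsin per milliliter of digest.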

Cloning, expression, purification, and crystallization of the tryptic fragments 
of N protein. The NTD and CTD proteins from two strains were employed in this 
study, namely, IBV Gray (CTD1, CTD2, and NTD1) and IBV Beaudette 
(CTD3) (Fig. 3). The proteins were cloned and expressed as glutathione 
S-transferase fusion proteins using the pET-41 Ek/LIC vector (Novagen) or, for the 
Beaudette strain, as detailed previously (14). The expressed protein was purified 
using a glutathione S-Sepharose (Pharmacia) affinity column, followed by on- 
bead cleavage with enterokinase (EK-Max, Invitrogen). The cleavage reaction 
was performed by suspending 1 ml of beads in 40 ml of cleavage buffer (250 mM 
NaCl, 50 mM Tris-HCl [pH 8.0]) with 10 units of protease. Following proteolysis, 
the diluted supernatant was further purified by gel filtration chromatography on 
a Superdex 75 16/60 column (Pharmacia). The purified N- and C-terminal do- 
mains were concentrated to 5 to 8 mg/ml for crystallization. 

Data collection and phasing. Diffraction data were collected at various synchrotron 
beam lines as indicated in Table 1. For each crystal, images were 
collected using an oscillation angle of 1° and integrated and scaled with 
HKL2000 (31). For the NTD, the diffraction data to 1.3 Å were phased using 
molecular replacement (MR) procedures in PHASER (38), using the previously 
published NTD structure (PDB ID, 2BTL) at 2.8-Å resolution (14). Following 
MR, further model building and refinement were performed in a similar manner 
to that for the CTD as described below. The CTD crystallized in three crystal 
forms (Table 1). Its structure was determined using selenomethionine (Se-Met)-substituted 
protein with crystal form CTD1 (Table 1) using multiwavelength 
anomalous dispersion data sets collected at two different wavelengths (Se peak, 
0.9734 Å; Se inflection, 0.9748 Å). Positions of the four Se atoms were located 
using the SnB program (43) and refined using SHARP (overall figure of merit of 
0.65) (2). An electron density map was calculated following density modification 
using CCP4 (8). An initial model was built using ARP/WARP (21) followed by 
manual model building using Coot (13). A few cycles of simulated annealing 
were performed with CNS (3) followed by model refinement using REFMAC5 
(32). The structures of CTDs in the two other crystal forms (CTD2 and CTD3) 
were solved using MR procedures as implemented in PHASER (23). Model bias 
in both NTD and CTD structures was reduced by using the prime-and-switch 
technique implemented in SOLVE/RESOLVE (42). During the course of model 
building and refinement, the stereochemistry of the structures was checked by 
PROCHECK (22). The final statistics are provided in Table 2. Surface electrostatic 
potentials were calculated using DELPHI (30). All figures were generated 
using PyMOL (10) and ESPript (15). 

TABLE 2. Refinement statistics 

Domain (PDB code)                                     CTD1 (2GE7)  CTD2 (2GE8)  CTD3 (2CA1)  NTD1 (2GEC)
Resolution range (Å)                                  50-2.0       48-2.2       20-2.6       50-1.3
Number of reflections                                 15,110       48,564       8,941        60,751
Rfactor(a)                                            0.238        0.236        0.204        0.210
Rfree(b)                                              0.269        0.291        0.256        0.250
Mean bond length deviation                            0.005        0.006        0.009        0.008
Mean bond angle deviation                             1.305        1.318        1.176        1.235
Ramachandran statistics
  Residues in most-favored regions (%)                94.2         91.6         89.8         88.6
  Residues in additional allowed regions (%)          5.8          7.6          10.2         10.4
  Residues in generously allowed regions (%)                       0.6                       0.5
  Residues in disallowed (poor density) regions (%)                0.2                       0.5

(a) Rfactor = Σ||Fobs| − |Fcalc||/Σ|Fobs|. 
(b) Rfree was calculated with 10% of reflections excluded from the whole refinement procedure. 
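
For readers less familiar with the refinement statistics reported in Table 2, the following minimal Python sketch shows the standard definition of the crystallographic R factor and how a free set of reflections (10% in this work) is held out to compute Rfree. The amplitudes, the random seed, and the function names are made up for illustration; this is not the refinement code used in the study.

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    num = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    den = sum(abs(o) for o in f_obs)
    return num / den

def split_free_set(n_reflections, fraction=0.10, seed=0):
    """Return (work_indices, free_indices); the free set is kept out of refinement."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = int(round(n_reflections * fraction))
    return sorted(indices[n_free:]), sorted(indices[:n_free])

# Made-up structure factor amplitudes (not data from this study)
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0, 50.0, 70.0, 90.0, 30.0, 10.0]
f_calc = [90.0, 85.0, 55.0, 45.0, 25.0, 40.0, 75.0, 80.0, 35.0, 15.0]

work, free = split_free_set(len(f_obs))
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R(work) = {r_work:.3f}, R(free) = {r_free:.3f}")
```

Because the free reflections never enter refinement, Rfree is normally somewhat higher than the working R factor, which is the pattern seen for all four structures in Table 2.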



Protein structure accession numbers. The coordinates for the molecules were 
deposited into the PDB with accession numbers 2GE7 (CTD1), 2GE8 (CTD2), 
2CA1 (CTD3), and 2GEC (NTD1). 


RESULTS 


Identification of two stable independent domains of IBV N 
by limited proteolysis. Since the full-length recombinant pro- 
tein aggregated and was degraded under a variety of experi- 
mental conditions, we sought to identify stable domains that 
were resistant to mild proteolysis. We used limiting amounts of 
trypsin and V8 protease. The digestion pattern with the V8 
protease was not very distinct, yielding several diffuse bands. 




FIG. 1. Structural domains of the IBV N protein and structure of the NTD RNA binding domain. (A) Schematic diagram showing the major 
(arrow) and minor trypsinization sites (short vertical line) following limited proteolysis of the full-length IBV N protein. The locations of the N- 
and C-terminal domains (NTD spanning residues 19 to 162 and CTD spanning residues 219 to 349) are depicted as black rectangles. (B) Ribbon 
representation of the 1.3-Å structure of the NTD (Gray strain) asymmetric homodimer (molecules A and B as indicated). The region 
corresponding to the disordered internal arm is depicted in orange. (C) The NTD (Beaudette strain) determined by Fan et al. (14). (D) Elec- 
trostatic potential surface of the linear array of NTD dimers generated by the crystallographic translation. Molecules A and B that constitute the 
dimer are indicated. The N-terminal arm and the region corresponding to the internal arm, rich in basic residues, are indicated by black and cyan 
arrows, respectively. The disordered loop in the B molecule is indicated by a dotted line (see the text). 
FIG. 2. Structure of the CTD dimerization domain. The left panel shows a ribbon representation of the “front” and “back” of the CTD dimer 
related by a rotation of 180° about the vertical axis, and the right panel shows the electrostatic potential surface of the dimer in the same 
orientations, in which the positively charged surface is represented in shades of blue and the negatively charged surface in shades of red. Left, the 
intertwined CTD dimer is formed by exchanging two β strands and one α helix between the two monomers (“domain swapping”). The two 
monomers, in yellow and gray, respectively, are related by a noncrystallographic twofold axis of symmetry approximately perpendicular to the plane 
of the figure. The β strands from both monomers form an extended antiparallel β-sheet floor flanked by several α helices. Secondary structural 
elements are labeled. Right, a large patch of positively charged residues (blue) that could be involved in RNA binding is visible on one of the faces 
of the CTD protein (bottom). 


The full-length protein, however, could be cleaved into a “sin- 
gle” stable ~17-kDa band within 15 min of trypsinization. N- 
terminal sequencing allowed the identification of four tryptic frag- 
ments with two major cleavage sites at residues 19 and 219 and 
two secondary cleavage sites at residues 27 and 226 (Fig. 1A). 

New constructs corresponding to these proteolytically resis- 
tant domains termed NTD (residues 19 to 162) and CTD 
(residues 219 to 349) were cloned, expressed, and purified. The 
NTD was monomeric at moderate protein concentrations, 
whereas the CTD was a dimer even at very low concentrations, 
as assayed by gel filtration chromatography (12). The NTD and 
CTD proteins tended to aggregate during the purification procedure 
and were thus purified at very low concentrations and 
concentrated only prior to crystallization. 

NTD and CTD crystallized in multiple crystal forms. The 
recently reported structure of NTD at 1.85-Å resolution corresponds 
to the Beaudette strain of IBV. We used the IBV 
Gray strain, which crystallized in a different crystal form and 
diffracted to 1.3-Å resolution (Table 1, crystal form NTD1). The 
CTD yielded crystals with needle, rod, flat sheet, or hexagonal 
shapes under various conditions (Table 1, crystal forms CTD1, 
CTD2, and CTD3). Rod-shaped CTD1 crystals of Se-Met-substituted 
protein diffracting to 2.0 Å were used for structure 
determination. The structures of CTD in the other crystal 
forms (CTD2 at 2.2-Å and CTD3 at 2.6-Å resolution) were 
determined subsequently by molecular replacement. These 
various crystal forms exhibit different packing arrangements, a 
number of which could mimic the intermolecular interactions 
that trigger the formation of the coronavirus nucleocapsid. 

High-resolution structure of NTD. With the exception of 
five additional residues discernible at its N terminus, the 
present structure of NTD of IBV Gray strain is quite similar to 
the structure of the NTD of IBV Beaudette strain (Fig. 1C) 
(14). Briefly, the NTD monomer features a relatively acidic 
globular core of twisted antiparallel β-sheet surrounded by 
several loop regions. Prominent among the loop regions are 
two long segments corresponding to the N-terminal 12 amino 
acids (residues 22 to 34) and an internal arm spanning residues 
74 to 86. These loops protrude from the globular core resulting 
in a “U”-shaped monomer (Fig. 1B). 

A dimer of NTD subunits is present in the crystallographic 
asymmetric unit: two interlocking NTD monomers are ar- 
ranged in a head-to-tail fashion with the basic arms of one 
monomer interacting with the acidic base of the other mono- 
mer (Fig. 1B). The main structural variation between the two 
NTD monomers related by noncrystallographic symmetry is 
that in one monomer, one of the arms of the “U” (internal 
arm) is disordered. The buried surface area of ~2,150 Å² 
between the two NTD monomers indicates a rather strong 
interaction. This is in contrast with the previous structure of 


FIG. 3. Structure-based alignment of coronavirus nucleocapsid amino acid sequences corresponding to the CTD dimerization domain. 
Secondary structure elements are labeled above the sequence for the IBV CTD dimerization domain (this work). Sequences for IBV N proteins 
were obtained from Swiss-Prot (IBV-G [Gray strain], P32923; IBV-B [Beaudette strain], P69596). Sequences for human coronavirus (H-CoV; 
strain HKU1, YP_173242); MHV (strain 1, AAA46439); SARS (SARS-CoV, NCAP_CVHSA) and porcine transmissible gastroenteritis virus 
(TGEV; strain RM4, AAG30228) were obtained from GenBank. Helices (3₁₀ and α helices) are shown as squiggles and β strands as arrows. Boxes 
indicate residues that are fully or partially conserved. Fully conserved residues are shaded in red. Partially conserved residues are indicated by 
salmon-pink letters. 


NTD (14), where the U-shaped monomers were kept parallel 
but which had a much smaller buried surface area of only ~590 
Å² (Fig. 1C). 

The structure of CTD is a tightly intertwined dimer. In all 
three crystal forms obtained, the CTD exists as a tightly intertwined 
twofold symmetric dimer (Fig. 2) with two β strands 
and one α helix from one monomer making extensive contacts 
with the other monomer and burying a total surface of approximately 
5,000 Å² in their interaction. The distribution of secondary 
structure elements along the CTD amino acid sequence 
and the topology of the CTD dimer are shown in Fig. 3. The 
CTD dimer has a rectangular shape delimited by edges formed 
by the C-terminal α helices α5 (Fig. 2) with approximate dimensions 
of 40 Å × 40 Å × 20 Å. It features a concave floor 
of ~400-Å² area consisting of an antiparallel β sheet (β1A-β2A-β2B-β1B) 
contributed by monomers A and B, respectively, 
surrounded by several α helices and one short 3₁₀ helix 
(Fig. 2). Helices α3 and α4 are connected by a loop and, 
together with their dimeric partners, form a groove which 
arches inward over this floor constituting the other mostly 
basic face of the CTD dimer. Recent biochemical and mass 
spectrometric studies on the IBV N protein (Beaudette strain) 
have suggested the possibility of disulfide bridges in the CTD 
(5). However, no intramolecular disulfide bridge is seen in the 
CTD dimer of either strain of IBV reported here. The present 
structure of the CTD is consistent with previous observations 
that the CTD is a dimer in solution and with several biochemical 
studies which map the dimerization domain of the full-length 
protein to its C-terminal domain (40, 45). 

Multiple packing modes of CTD dimers. The CTD dimer is 
involved in various intermolecular interactions in the three 
crystal forms reported here. The presence of one dimer (CTD1 
and CTD3) and four dimers in the asymmetric unit of the 
CTD2 crystal form (Tables 1 and 2) permits an analysis of 
dimer-dimer interactions in various conditions of precipitant 
and pH. We focused on interactions with a buried surface area 
larger than 1,000 Å². Such an analysis could help in the identification 
of molecular surfaces that are used to assemble the 
nucleocapsid. 


CTD1 dimers (pH, 4.5) which are related by the crystallographic 
2₁ screw axis display three kinds of intermolecular 
contacts. One interaction (burying ~1,100 Å²) brings two 
dimers together in a tail-to-tail fashion (referred to as type S) (Fig. 4A). 
Interestingly, in the CTD2 form which crystallized at pH 8.5, a 
similar contact is observed between three of the four crystallographically 
independent dimers (Fig. 4B). However, unlike 
in CTD1, where the dimers form an infinitely long linear array, 
a small swivel between the three CTD2 dimers (dimers 1, 2, 
and 3) introduces a slight curvature. In both crystal forms, type 
S interactions are mediated by C-terminal residues located 
between positions 308 and 328, which include α helix α5 and a 
type II turn. A network of water-mediated polar interactions 
and a salt bridge between residues Arg 308 and Asp 314 are 
observed (Fig. 4A, bottom). Secondly, a lateral interaction 
between dimers 2 and 4 (Fig. 4B) mediated by their N-terminal 
residues (221 to 230) buries a comparable surface area of 
~1,250 Å² (type L). Dimer 4 extends the helical array formed 
by dimers 1, 2, and 3 in a lateral manner (Fig. 4B [type S’]). 
Finally, in the CTD3 crystal form, dimers form a long helical 
polymer that spirals along the fourfold screw axis, with a buried 
surface area of ~1,085 Å² (type F). These interactions, mediated 
by hydrogen bonds between Arg 230 of one monomer and 
carbonyl atoms from residues 263 to 266 in the other monomer, 
bear some resemblance to the type L interactions seen in 
CTD1 and CTD2. These contacts result in an infinitely propagating 
tube of CTD3 molecules with a diameter of approximately 
60 Å (Fig. 4C and 5B). Interestingly, this arrangement 
of CTD subunits would place the RNA binding N-terminal 
domains towards the interior of the tube and the C-terminal 
domains pointing outside. 
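
To illustrate how a fourfold screw axis turns a single dimer into the helical, tube-like polymer described for the CTD3 crystals, the following Python sketch applies the screw operation repeatedly to one point. The handedness of the screw, the 30-Å distance of the dimer centre from the axis, and the 100-Å c-axis length are all assumptions chosen for illustration; the actual unit-cell parameters are those listed in Table 1.

```python
import math

def screw_4(point, c_length, n):
    """Apply a fourfold screw operation n times.

    Each application rotates by 90 degrees about the z (c) axis and shifts
    by c/4 along z. The rotation sense used here is an assumption.
    """
    x, y, z = point
    angle = n * math.pi / 2.0
    xr = x * math.cos(angle) - y * math.sin(angle)
    yr = x * math.sin(angle) + y * math.cos(angle)
    return (xr, yr, z + n * c_length / 4.0)

# Hypothetical values: dimer centre 30 A from the axis, c = 100 A
start = (30.0, 0.0, 0.0)
c_len = 100.0
for n in range(5):
    x, y, z = screw_4(start, c_len, n)
    print(f"dimer {n}: x={x:7.2f} y={y:7.2f} z={z:7.2f}")
```

After four applications the dimer returns to its starting x and y coordinates, displaced by one full unit cell along c, which is why the packing propagates as an infinite hollow tube around the axis.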


DISCUSSION 


A flexible filamentous nucleocapsid formed by the close 
association of N proteins with viral genomic RNA is a common 
feature in many enveloped ssRNA viruses including coronavi- 
ruses. Structural information on the N protein and a molecular 




VOL. 80, 2006 


STRUCTURE OF THE CORONAVIRUS NUCLEOCAPSID PROTEIN 6617 


FIG. 4. Crystal-packing interactions between CTD dimers. (A) Crystal-packing interactions in CTD1 crystals grown at pH 4.5, with one dimer 
in the asymmetric unit (ASU). Three consecutive dimers from the neighboring ASU (numbered n, n + 1, and n − 1) related by one of the three 
orthogonal 2₁ screw axes are shown (type-S interaction). Each monomer has been given a different color. The N- and C-terminal ends for dimer 
n + 1 are indicated. The salt bridge interaction between dimers n and n − 1 seen in the type-S interface is circled. A closeup view of the salt bridge 
interaction with an electron density map is shown in the inset below. (B) Interactions between the four CTD dimers present in the CTD2 ASU 
(pH 8.5). Each dimer is shown in a different color and numbered from 1 to 4. The two classes of dimer-dimer interactions are indicated by S 
(between molecules 1 and 2 and molecules 2 and 3) and L (between molecules 2 and 4). The bridging type-S’ interactions are shown with molecule 
4 from two adjacent ASUs and molecules 1, 2, and 3 from the neighboring ASU (all shown in gray). (C) Dimer-dimer interactions observed in the 
CTD3 crystals having one CTD dimer in the ASU (type F). Each dimer related by the crystallographic fourfold screw axis is shown in a different color. 
This type of packing gives rise to hollow tubes, as is evident from the projection along the fiber axis (along the c axis) shown below in red. 


understanding of how this protein facilitates the formation of 
the nucleocapsid are limited. From the biochemical character- 
ization of the N protein of IBV, a prototypical coronavirus, 
presented here, it is apparent that this protein has two major 
protease-resistant domains. Our X-ray crystallographic analy- 
sis of these two domains, NTD and CTD, provides some in- 
sights into how the two-domain organization of the N protein 
may coordinate nucleocapsid assembly. 

Interaction of N with RNA. Biochemical studies (14, 29) 
have located the RNA binding site in the N-terminal domain 
with the minimal region being mapped to residues 177 to 
231 in MHV (corresponding to residues 136 to 190 in IBV). 
In addition to the NTD, an involvement of the CTD in RNA 
binding has been shown by Fan et al. (14). Based on the 
structure of the NTD (IBV Beaudette strain), we proposed 
that the basic arms of the U-shaped monomer participate in 
RNA binding. This hypothesis is consistent with an NMR- 
heteronuclear single quantum coherence analysis of NTD- 
RNA interactions that was carried out for the SARS coro- 


navirus N protein (18). A novel finding from our crystal 
structure analysis is the presence of strong interlocking 
dimers of the NTD (IBV Gray strain). This is in contrast 
with the weak dimeric interaction observed in the NTD of 
the IBV Beaudette strain reported earlier (14). These in- 
terlocking dimers associate to form a linear fiber with the 
basic tethers exposed along the surface. Such fibers could 
provide for closely packed interactions of NTD with the 
viral genomic RNA. Analysis of N protein-RNA interactions 
in MHV at different stages of the virus life cycle revealed 
that these interactions progress from an RNase-sensitive 
complex involving subgenomic RNA to an RNase-resistant 
complex involving genomic RNA (28). The strong and weak 
NTD dimer interactions seen in the two structures could 
correspond to these different states of N protein-RNA associations. 
The electrostatic potential surface of the CTD dimer 
shows one of its faces significantly more basic than the other, with 
a groove lined by α helices α3 and α4 that could interact with 
RNA (Fig. 2 and 5A and B). 
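
The residue correspondence quoted at the start of this section (MHV residues 177 to 231, IBV residues 136 to 190) can be checked with a short Python sketch. It assumes a gap-free alignment across the segment, so that a constant numbering offset applies; that assumption and the function names are ours and are not stated in the text.

```python
MHV_SEGMENT = (177, 231)  # minimal RNA binding region in MHV, from the text
IBV_SEGMENT = (136, 190)  # corresponding IBV residues, from the text

def segment_length(segment):
    """Number of residues in an inclusive (start, end) range."""
    start, end = segment
    return end - start + 1

def mhv_to_ibv(residue):
    """Map an MHV residue number inside the segment to IBV numbering.

    Assumes no insertions or deletions within the segment (constant offset).
    """
    if not MHV_SEGMENT[0] <= residue <= MHV_SEGMENT[1]:
        raise ValueError("residue lies outside the mapped MHV segment")
    offset = MHV_SEGMENT[0] - IBV_SEGMENT[0]
    return residue - offset

print(segment_length(MHV_SEGMENT), segment_length(IBV_SEGMENT))
print(mhv_to_ibv(MHV_SEGMENT[0]), mhv_to_ibv(MHV_SEGMENT[1]))
```

Both ranges span the same number of residues, which is consistent with the 55-residue RNA binding segment mentioned in the introduction.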
FIG. 5. A possible model for helical nucleocapsid formation. (A) Electrostatic potential surface of the CTD polymer formed through S’ 
interactions in the CTD2 crystals. (B) View of the electrostatic potential surface of the assembly formed by the close association of five CTD dimers 
related by the fourfold screw axis, from neighboring unit cells, as observed in the CTD3 crystals. The grooves lined by basic α helices are well exposed 
to the solvent and could thus participate in viral genomic RNA binding. (C) A possible model for the nucleocapsid formation based on the 
protein-protein interfaces observed in the crystallographic structures of the NTD and CTD domains of IBV. The NTD dimers (gray spheres) 
provide specific binding to the viral genomic RNA (black line). Secondary contacts with RNA are formed by polymers of CTD so that together 
the viral genome is condensed in a protective flexible tube. The CTD dimers (shown in red and green) could interact via the type-S interface. Changes 
in the curvature and/or the direction of the nucleocapsid filament would derive from the incorporation of other types of isoenergetic interfaces 
(e.g., type-S’, type-L, or type-F interactions). (D) This arrangement is reminiscent of the assemblies formed by the PRRSV capsid-forming domain 
(PDB ID, 1P65) (11). 


Interaction between the N and M proteins. In addition to 
their interactions with RNA, N proteins also interact with the 
M proteins embedded in the viral membrane. Based on reverse 
genetic-complementation assays, the interaction region be- 
tween these two proteins has been mapped to their C termini 
(19). The C terminus of the M protein is significantly basic, and 
recent mutational studies on the M protein have demonstrated 
that its interaction with the N protein is predominantly electrostatic 
in nature (26). The exposed acidic β-sheet floor, on 
the opposite side of the proposed RNA-binding region in the 
CTD dimer, may promote such an interaction. Thus, the CTD 
may serve a dual purpose of mediating the self-association of 
the N protein during nucleocapsid formation but also of pro- 
viding a complementary surface for interaction with the 
endodomain of the M protein in the virus envelope. 

Possible model for nucleocapsid formation. The formation 
of the coronavirus nucleocapsid involves self-association of the 
N protein and interaction with RNA resulting in a structure 
which is RNase resistant. Our crystal structure analysis of CTD 
reveals a tight dimer mediated by an exchange of secondary 
structure elements (domain swapping), thus suggesting that 
the full-length N protein also functions as a dimer, with the 
CTD providing a structural scaffold while the NTD mainly 
serves as a module for RNA interaction. The relative orienta- 
tion of the NTD with respect to the CTD in the N protein 
remains unknown, because in the full-length protein these two 
domains are connected by a 47-residue protease-sensitive loop, 
rich in Ser and Gly residues, which is presumably mobile. In 
the filamentous ribonucleocapsid, thanks to the flexibility pro- 
vided by this linker, the RNA binding regions of these two 
domains could face each other, engulfing the RNA between 
them, thus conferring resistance to RNases. 

In the various crystal forms studied, the NTD and CTD 
self-associate in multiple modes with buried surface areas 
greater than 1,000 Å². A relevant question is which (if any) of 
these interactions are used in the formation of the nucleocap- 
sid. Both the type-S and -F interactions between CTDs are 
conducive to the formation of fibril structures. Propagation of 
any single type of interaction, however, would lead to a rigid 
helical nucleocapsid. Considering that the coronavirus nucleo- 
capsid is not a rodlike structure, nucleocapsid assembly may 
thus involve a combination of the various interactions observed 
in our studies. The type-S interface could be used to a greater 
extent given that it occurs over a wider range of pH and seems 
to form regardless of the constraints imposed by crystal-pack- 
ing forces (as in the CTD2 crystals). A combination of the 
type-S interactions with types L and F would modulate the 
curvature of the nucleocapsid in the virion (Fig. 5C). 
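
The selection criterion used throughout this analysis (a buried surface area larger than 1,000 Å²) can be summarized in a short Python sketch. The areas are the approximate values quoted in Results; the dictionary labels and the function name are ours, introduced only for illustration.

```python
# Approximate buried surface areas (square angstroms) quoted in the text
INTERFACES = {
    "NTD dimer (Gray strain)": 2150,
    "NTD dimer (Beaudette strain)": 590,
    "CTD type S": 1100,
    "CTD type L": 1250,
    "CTD type F": 1085,
}
THRESHOLD = 1000  # cutoff used in the text

def above_threshold(interfaces, threshold=THRESHOLD):
    """Return interface names exceeding the cutoff, largest area first."""
    selected = [name for name, area in interfaces.items() if area > threshold]
    return sorted(selected, key=lambda name: -interfaces[name])

for name in above_threshold(INTERFACES):
    print(f"{name}: ~{INTERFACES[name]} A^2")
```

Only the weak Beaudette-strain NTD contact falls below the cutoff; the three CTD dimer-dimer interfaces bury comparable areas, in line with the description of these interfaces as roughly isoenergetic alternatives.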

Similarity with other coronavirus N proteins. Coronaviruses 
have been classified into four groups, with SARS-CoV being 
the founding member of an independent group. The N protein 
sequences are more similar within each group (~40% identity) 
than across groups (20 to 30% identity). The only X-ray struc- 
ture of a coronavirus N protein available to date is that of the 
IBV NTD protein as described here and by Fan et al. (14). 
Despite very low sequence similarity between IBV and the 
SARS-CoV N proteins, their NTD and CTD structures adopt 
the same general polypeptide fold. This suggests that this fold 
is essentially preserved across the various coronavirus groups. 
This conclusion is also consistent with the NMR structures of 
the N- and C-terminal domains of the SARS N protein re- 
ported recently (4, 18). 
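
The percent identity figures quoted above are computed from pairwise sequence alignments. The following Python sketch shows one common convention, in which identity is counted over aligned columns where neither sequence has a gap; other conventions use a different denominator, and the text does not state which one was applied. The two short sequences are toy examples and are not taken from Fig. 3.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared

# Toy aligned fragments (hypothetical sequences)
a = "MKV-LDRTA"
b = "MRVGLD-SA"
print(f"{percent_identity(a, b):.1f}% identity")
```

Applied to a structure-based alignment such as the one in Fig. 3, the same calculation gives the within-group and across-group comparisons discussed in this section.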

Comparison with the N proteins of other positive-strand 
ssRNA viruses. Crystal structures of nucleocapsid proteins 
from several other positive-strand ssRNA viruses, including 
porcine reproductive and respiratory syndrome virus (PRRSV) 
in the Arteriviridae (11), West Nile virus in the Flaviviridae (12), 
and Sindbis virus and Semliki Forest virus in the Togaviridae 
family (6, 7), have been reported. The C-terminal domains of 
nucleoproteins from Sindbis virus and Semliki Forest virus 
adopt a chymotrypsinlike β-barrel fold. Core proteins from 
West Nile and dengue virus are dimers composed entirely of 
α-helical bundles. As indicated by a systematic structural homology 
search (17), the coronavirus N protein CTD closely 
resembles the nucleocapsid protein of PRRSV. The PRRSV 
capsid-forming domain (C-terminal 73 to 123 amino acid res- 
idues) has a similar dimeric structure and exhibits self-associ- 
ation mediated by a salt bridge as seen in the IBV CTD (11) 
(Fig. 5D). Although a “domain-swapping” mode of oligomer- 
ization has been observed in a variety of proteins which are 
known to self-aggregate (24), the nature of domain swapping 
observed in these two viral proteins appears to be unique. 

Interestingly, unlike the CTD fold which is shared with the 
arterivirus PRRSV, the fold of the NTD is observed only in the 
coronavirus N proteins (14). The corresponding basic N-ter- 
minal RNA binding domain in the much shorter PRRSV N 
protein appears to be largely disordered (11). Thus, the ob- 
served structural similarity between the N proteins of IBV and 
PRRSV suggests that members of the Coronaviridae and 
Arteriviridae families (order Nidovirales) share a common mech- 
anism of filamentous nucleocapsid formation with suitable al- 
terations necessary to interact specifically with their respective 
genomes. Conversely, the structural differences with other nu- 
cleocapsid proteins from ssRNA enveloped viruses, such as 
flaviviruses and togaviruses, probably reflect variations in their 
replication strategies and assembly pathways. Indeed, flavivi- 
ruses and togaviruses exhibit icosahedrally symmetric exteriors 
and are not pleomorphic like coronaviruses (27, 46). One common 
feature in the nucleocapsid proteins across positive-strand 
ssRNA viruses, however, appears to be the partitioning of the 
nucleocapsid protein structure into two domains: one forming 
a protective scaffold around the RNA through self-association 
and the other providing specific interactions with the viral 
genome. Electron-microscopic studies of ribonucleocapsid as- 
semblies at higher resolution are now needed in order to gain 
further insights into the molecular mechanism of nucleocapsid 
formation for coronaviruses. However, we hope that the 
atomic description of the N protein provided here will stimu- 
late such studies and also the design of molecules that could 
disrupt viral assembly. 